ABSTRACT

It seems possible to evaluate language proficiency as 
behavioral ability by: (1) observing authentic language behavior, (2) 
examining the tacit knowledge that underlies language behavior or 
testing specific tasks based on capabilities that are linked to 
implicit knowledge of a language, and (3) testing the acquired 
explicit knowledge, such as use of grammatical rules. A test was 
administered to foreign nationals in the Netherlands who were 
applying to teach their native languages and cultures to 
preschoolers. The test, intended to measure the first two factors, 
consisted of an oral proficiency test of authentic language behavior, 
a standard multiple-choice test, a multiple-choice test of 
orthography and morphology, and a cloze test. Analysis of the data 
found similar results for measurement of language proficiency by 
evaluating language behavior and by testing implicit knowledge. 
Problems with the orthography subtest emphasized the need to consider 
language background when evaluating proficiency. (MSE) 
Jan van Weeren (Cito, Netherlands)

0 Introductory remarks

In this paper I will present a rather general framework
concerning the linguistic background of language testing. I will
[...] but that it is essentially an instrument with a specific
function. Various aspects of tests will be discussed, such as the
problem [...] of immigrant teachers is taken as an [...]

1 The nature of language proficiency 

When we say that someone is proficient in a foreign language,
we imply that he or she is able to perform in that language, is
able to show actual foreign language behaviour. [...]

[...] when it is necessary to measure language
proficiency, it can be measured on the basis of actual authentic
[...] behaviour. It is also possible to consider
this behavioural ability as a form of tacit, implicit knowledge
[...] language behaviour, which does not
necessarily need to be evaluated through this behaviour. One can
specify which tasks someone must be able to perform on the basis
of his assumed tacit knowledge of a language, without these tasks
[...] A
third approach to this behavioural ability evaluation takes place
by virtue of the individual's explicit knowledge about a language
[...] use of the simple present and the progressive form in English
[...] is allowed to be on familiar terms in German. This explicit
[...] precondition [...]

[...] language proficiency as behavioural ability seem to be possible
[...] tacit knowledge that underlies language [...]
[...] acquired explicit knowledge.
In the following I will discuss these approaches successively. 
1.1 Evaluation by means of authentic language behaviour

Evaluation of language proficiency as a behavioural ability
by means of authentic language behaviour entails a method which is
extremely face-valid. If you want to know if a person can take part
in a foreign language conversation, you make him or her participate
in it. If you want to know if he or she is able to read a text of a
specific type, you put questions to him that test his comprehension
of the text, or, alternatively, you make her write a precis.
As a rule, there will be no objections to such an approach. It is
clear and obvious that the procedure makes sense. Add to this that
the method entails a positive backwash effect: as a result of the
way of testing, matters are trained in school that are actually
required outside school.

However, when the testing expert comes in, a number of problems
will arise that increases the more the language behaviour under
observation resembles the language behaviour that we meet in
practice. Human beings usually do not read texts in order to answer
a fixed set of reading comprehension questions, but in order to
realize specific goals that can vary with the individual and the
situation. Their reading comprehension can express itself in various
ways. Distinct aspects of the text may be relevant for some, but
not for others.

However, testing will require a certain amount of standardization.
A conversation which is truly free and open is known for the
sort of evaluational problems that are related to the reliability
of rating. Regarding the free conversation there is yet another
source of unreliability, that is: the testees themselves. Not
everyone will be able to speak just as easily about any topic.
One should consider factors such as empathy and affective thresholds,
factors that have very little to do with language proficiency
(IMtrttlll 1SB3). 

Another problem is that of content validity: to what extent
can observations of incidental language behaviour provide us with
reliable and complete information about the testee's behavioural
ability, which the tester is eventually to judge?

1.2 Evaluation by means of tasks based on implicit knowledge



Specific tasks based on capabilities that are linked up with
implicit knowledge of a language can be derived from traditional
generative linguistics. This discipline operates on the basis of
the concept of competence. Linguistic competence is generally
defined as the individual's knowledge of the structure of his
language. On the basis of this knowledge a native speaker is able
to make judgements about his language system resulting from his
'[...] linguistic intuitions'. He is able to discriminate between
well-formed and not well-formed sentences and to distinguish
[...] differences and similarities in sentences that belong
to his language.

Dell Hymes expanded the Chomskyan concept of linguistic
competence to communicative competence. Competence remains
underlying knowledge, but is extended to include all the aspects
of knowledge that affect communicative behaviour. On the basis of
this knowledge a native speaker is able to judge not only the [...]
From a theoretical point of view the concept of communicative
competence is too hybrid, although the concept of linguistic
competence cannot be said to be theoretically uncomplicated, either.
From a practical point of view, however, the concept of competence
gives support to many traditional language tests.

The main question is how test data obtained by means of items
based on such tasks relate to the behavioural ability that we aim at.

It can be hypothesized that the attained level of performance
on these tasks gives us an indication of someone's language
proficiency as behavioural ability. To put it in a different way:
the obtained performance data will represent some measure of language
proficiency.

A testing procedure which is based on tasks that a testee must
be able to perform in virtue of his competence shares the problem
of content validity with the evaluation by means of authentic
language behaviour. Every testing procedure has to confine itself
to a sample of possible performances. It is necessary to indicate or
to test empirically to what extent such a sample is representative.

The hypothesis that testing based on tasks fitting the concept
of competence on the one hand and the evaluation of authentic language
behaviour on the other hand will yield equivalent information about
language proficiency as behavioural ability can be tested
empirically. This can be done by determining the concurrent
validity of two alternative test forms. Clark (1972) found, for
example, correlations from .82 to .92 between the FSI-interview and
a battery of objective tests for vocabulary and structures.
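
A minimal sketch of how such a concurrent-validity check could be
computed, assuming paired scores of the same testees on two test
forms. The score lists are invented for illustration and are neither
Clark's data nor data from this study; pearson_r is a hypothetical
helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a score list has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of six testees on an interview and on an objective test.
interview = [2, 3, 3, 4, 5, 5]
objective = [41, 50, 47, 62, 70, 66]
print(round(pearson_r(interview, objective), 2))
```

A coefficient close to 1 would suggest that the two forms rank the
testees in much the same way; it does not by itself show that they
measure the same ability.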

Tests consisting of tasks that the testee must be able to
perform because of his implicit knowledge of a language have the
advantage of the possibility of an objective form. An exclusive use
of this type of test in instructional settings, however, will
inevitably carry with it the disadvantage of an undesirable
backwash effect, unless particular measures concerning the
curriculum have been provided against this effect. Otherwise
educational activities will focus on learning tasks of an abstract
nature and training of the actual use of the language will be
neglected.

1.3 Evaluation by testing explicit knowledge

In respect to the form of testing that focuses on explicit
language knowledge we can be brief. As early as half a century ago
it was stated that knowing about a language and knowing a
language are two different things. Explicit knowledge of rules is
neither a necessary nor a sufficient condition for a successful
and all-round language use. The first is proven by native speakers
of a language who did not receive any formal schooling in linguistics,
the second by grammar school pupils who, though having a thorough
command of explicit rules, cannot speak a word of Latin or Greek.
A teaching process that prepares for this kind of testing should be
considered as philology, rather than as language teaching.






2 Test functions



[...] theoretical background of test items are the main topics under
discussion. They mistakenly call a set of test items a test. A
test is essentially a different thing [...]

[...]

However, a clinical thermometer is only fit if it provides
information that enables the user to decide if he or she should
stay in bed, should consult a doctor, should take medicine, or has
reached a period of fertility.
The concrete test I would like to discuss was meant to support a
process of making decisions of the first kind. The testees were
groups of teachers who had come from various countries: Turkey,
Morocco, Spain, Italy, Yugoslavia, Portugal, Greece and Tunisia,
to teach their native language and culture to groups of children
with the same cultural background in primary schools. The treatment
offered was a special course that would make them fully qualified
teachers in primary schools. It was observed that these teachers
were not integrated into the teaching staffs, mainly because of the
fact that they were not qualified to teach any other subject but
their native language and culture; neither were they allowed to
teach other pupils than those of their own culture. In order to
further the integration and to widen the scope of these teachers
the course would give them the opportunity to acquire a full
qualification. The criterion in the flowchart would consequently
cover the objectives of a regular college of education.

To set an entrance level it was required that the candidate
members of the course had a satisfactory oral command of the
Dutch language so that they could communicate rather fluently.

On the authority of the inspectorate this requirement was
tightened up. To a certain extent command of the Dutch language was
required at the level of pupils in the last form of higher level
secondary schools. The inspectorate had its reasons for this: the
course would only take two years and after finishing it the
teachers would be formally qualified to teach the Dutch language
to Dutch children in primary schools!

On the basis of these requirements global selectional criteria
were set.

Cito, that is the Dutch National Institute of Educational
Measurement, was charged with their operationalization.




3 Choice and development of the instrument

With the theoretical considerations put forward at the beginning
of this paper in mind it was tried to obtain information about the
language proficiency of the testees in two different ways: firstly
by eliciting and evaluating authentic language behaviour and
secondly by administering items that would measure the underlying
implicit knowledge.






3.1 The oral proficiency subtest

For the evaluation of language behaviour the following type of
oral proficiency test was designed. It consists of about 15 stimuli
requiring the testees to play a certain part in a situational
dialogue. The stimuli are presented verbally. In most of the cases
a response is prompted by means of a picture, for example:

AT THE POLICE STATION

One day you discover your wallet is missing. You go to the police
station.

Good morning, an officer says. What can I do for you? /.../ Could
you tell me what was in your wallet? (picture 1) /.../ Have you any
idea where you could have lost your wallet? (picture 2) /.../ etcetera.

These stimuli are followed by circa 15 general questions about their
life in Holland.

The responses were each judged on intelligibility/appropriateness
and correctness. To each response a maximum of 6 points could be
assigned. The maximum meant that a response was perfect: appropriate
in the context given, perfectly intelligible and completely correct
in a grammatical sense. 5 points were assigned if the response was
intelligible and appropriate, although some minor mistakes were made.
If it took some effort to understand the response as a result of
certain important lexical or grammatical mistakes, 3 points were
assigned. If a response was either not forthcoming or not
intelligible or not appropriate at all, 1 point was assigned. In
all cases one point was deducted if repetition of the stimulus
was necessary.
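
The rating rules above can be sketched as follows, assuming a
maximum of 6 points per response, so that 30 items give 180 points.
The function names and the four quality labels are hypothetical
shorthand introduced here for illustration; they are not part of the
original rating instructions.

```python
def rate_response(quality, stimulus_repeated=False):
    """Return the points for one response under the rubric described above.

    quality is one of (hypothetical labels for the four cases):
      "perfect"   appropriate, perfectly intelligible, completely correct -> 6
      "minor"     intelligible and appropriate, some minor mistakes       -> 5
      "effortful" understood only with some effort                        -> 3
      "failed"    missing, unintelligible or not appropriate at all       -> 1
    """
    points = {"perfect": 6, "minor": 5, "effortful": 3, "failed": 1}
    if quality not in points:
        raise ValueError(f"unknown quality label: {quality!r}")
    score = points[quality]
    if stimulus_repeated:
        score -= 1  # one point is deducted when the stimulus had to be repeated
    return score


def total_score(ratings):
    """Sum the points over all (quality, stimulus_repeated) pairs of one testee."""
    return sum(rate_response(quality, repeated) for quality, repeated in ratings)


# 30 items, all perfect and none repeated: the maximum score.
print(total_score([("perfect", False)] * 30))  # prints 180
```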

In this rating system a sum total of 180 points with 30 items means
that all responses are perfect: intelligible, appropriate and
completely correct. There is a rapid decrease in degree of
correctness: a score of 150 points means that the average response
is appropriate and intelligible, but no more than that. With lower
scores a general appropriateness and intelligibility is preserved
over a relatively long period. Understanding takes some effort with
scores below 140. With a score of 110 or 100 points severe
misunderstandings will arise and communication will start to break
down.

Testing procedures like this one, based on authentic language
behaviour, have the additional advantage of enabling us to apply
natural, that is to say, common sense criteria when cut-off-scores
have to be determined. Criteria such as 'oral command of Dutch on
a near-native level' or 'being able to make oneself understood' can
almost immediately be translated into a score on the oral
proficiency test. An example can be taken from the rating system of
the FSI-interview: a natural criterion related to overt language
behaviour corresponds directly to a specific test score.

In the case of the teachers and the test mentioned above the
requirement of oral proficiency was operationalized as a sufficient
degree of appropriateness and intelligibility in oral expression
as measured by the test. With some safety margin the cut-off-score
was set at [...] points.
3.2 Testing implicit knowledge

[...] subtest focused on [...] in the [...]
words. This part [...] changeable and unchangeable
[...] in Dutch. [...]
indicator of language proficiency [...] subtest of a
cloze test in multiple-choice [...] taken from
transcriptions of [...]
distractors were obtained [...]
the test with [...]. With regard to the
underlying knowledge measured by [...]
that widely different [...]. Apart from lexical
fluency the test required from the [...] that they could process
[...] logico-semantical information in the
context.

3.3 Setting a cut-off-score for tests based on implicit knowledge

[...] cut-off-score [...]
linguistic and communicative competence [...]
formulate such a criterion [...]. For lack of
natural criteria the setting of [...] is essentially
arbitrary [...] procedure.
According to the [...] its own
norm. After the administration [...] error
of measurement [...] performance
on the test [...] discussion and
consensus, or following the [...]
of Nedelsky, Angoff or [...]

3.4 The choice of a reference population

To give an [...] reference
population [...]
successful foreign university students, this norm can be obtained
by pretesting.

This solution presented itself with the test for the foreign
teachers. I remind you of the fact that not only a functional oral
command of the second language was required, but also some practical
knowledge of Dutch on the level of pupils in the last form of
higher level general secondary schools. These are five-form schools
following primary education for pupils between the ages of 12 and 17.
These provided us with a clear-cut reference population.

The intended norm was defined as the score that anyone could
reach who was good enough to reach the last form. At first sight
finding this norm seemed very easy. However, we were confronted
with a slight complication.

Pretesting would depend on voluntary participation. The test
would not have any consequences for the pretest population. Thus
there was a chance that some pupils would take it less seriously.
Apart from sabotage we had to consider the fact that one pupil or
the other would have an off-day. For that reason it was rather
dangerous to define the norm as the lowest score of the selected
pretest population. Therefore it seemed reasonable to take that
score as a norm that was two standard deviations below the mean
score. With the expected distribution of scores this norm would be
reached by circa 95 percent of the pupils.
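
As an illustration of this norm-setting rule, the following sketch
computes a norm at two standard deviations below the mean of a set of
pretest scores. The scores are invented, the name norm_score is
hypothetical, and the use of the population standard deviation is an
assumption, since the formula actually applied is not stated. Under a
strictly normal distribution the share of scores above this point
would be about 97.7 percent, somewhat more than the circa 95 percent
mentioned.

```python
from statistics import NormalDist, mean, pstdev


def norm_score(pretest_scores, k=2.0):
    """Norm defined as k standard deviations below the mean pretest score."""
    return mean(pretest_scores) - k * pstdev(pretest_scores)


# Hypothetical pretest scores of a reference population.
scores = [62, 70, 71, 74, 75, 77, 78, 80, 83, 90]
norm = norm_score(scores)

# Share of this sample that reaches the norm.
reached = sum(s >= norm for s in scores) / len(scores)
print(round(norm, 1), reached)

# Share of a strictly normal distribution lying above mean - 2 SD.
print(round(1 - NormalDist().cdf(-2.0), 3))
```
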
A pretest was carried out with the following results:

RESULTS OF THE REFERENCE POPULATION (1983)

[...]

vocabulary and structures    [...]    .94
[...]                        111      .86     .89
cloze                        249      .91     4.4
The high reliability coefficients for orthography and cloze gave
food for thought. It might very well have been that some pupils did
not do their utmost. The item-analysis of the cloze test revealed
that several items were skipped at the end, that is, no answers
were given. If this occurs systematically, that is, if the same
testees skipped each of these items, reliability is flattered.
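
The flattering effect of systematic skipping can be sketched with an
internal-consistency coefficient. The example assumes KR-20 as the
reliability coefficient, which is not stated in the text, and uses a
small invented response matrix; kr20 is a hypothetical helper name.
Three extra items are appended that the same two testees leave
unanswered (scored 0) while all others answer them correctly.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for a matrix of 0/1 item scores (rows = testees)."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if k < 2 or total_var == 0:
        raise ValueError("KR-20 needs at least 2 items and some score variance")
    # Sum of p*q over items, p being the proportion of correct answers.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq_sum += p * (1 - p)
    return k / (k - 1) * (1 - pq_sum / total_var)


# Invented scores of six testees on four items.
base = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
]

# The last two testees skip three final items; everyone else gets them right.
skipped_by = {4, 5}
extended = [
    row + ([0, 0, 0] if i in skipped_by else [1, 1, 1])
    for i, row in enumerate(base)
]

print(round(kr20(base), 2), round(kr20(extended), 2))
```

In this toy case the coefficient rises sharply once the skipped items
are added, although they carry no new information beyond who stopped
early.
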
Pretest results of a similar test in 1984 are more reliable from
this point of view:

RESULTS OF THE REFERENCE POPULATION (1984)

Still, reliability of the cloze test is quite high, in view of the
impressive percentage of correct answers. However, extremely low
scores did not occur anymore.

After the pretesting of the reference population norms were set
following the procedure described.

4 Test results

Administration of the test with 87 immigrant teachers yielded
the following results:

                             number of
                             items      [...]    [...]

vocabulary and structures      10       .76      [...]
orthography                    20       .86      .56
cloze                         100       [...]    .52
speaking prof.                 30       .06      .67



Obviously the speaking proficiency test was much easier than the
other subtests. No wonder, in view of the fact that only a Basic
Interpersonal Communication Skill (Cummins 1979) was involved,
without the requirement of native speaker proficiency.
The rating of the elicited responses on the speaking proficiency
test was carried out by routine pairs of raters. Each rater
assigned his scores independently. The inter-rater reliability
was surprisingly high. To a certain degree this could be explained
by the heterogeneity of the test population.

pairs of raters     R

A/C                 [...]
[...]/D             [...]
[...]/C             [...]

The following correlation coefficients between the various subtests
were found:

cloze x vocab. & struct.               .78
cloze x orthography                    .7[...]
cloze x speaking prof.                 .7[...]
vocab. & struct. x orthography         .70
speaking prof. x vocab. & struct.      [...]
speaking prof. x orthography           [...]
5 Conclusions and discussion

On the basis of these modest results we could not reject the
hypothesis that the measurement of language proficiency can be
done by evaluation of language behaviour as well as by the testing
of implicit knowledge. Instruments of both types do provide
equivalent results. How these instruments relate to each other may
appear from the following survey:

9 candidates with a sufficient score on the speaking proficiency
test passed the vocabulary and structures subtest as well as the
cloze test. Only one of the candidates scoring just below the
cut-off-score of the speaking proficiency test (scores ranging from
140 to [...] points) passed both the vocabulary and structures subtest
and the cloze test. Another candidate passed the cloze test only.
Below 140 points just one candidate passed both subtests. If we had
based our selection decisions exclusively on the subtests vocabulary
and structures and the cloze test, only three misclassifications
would have resulted, in that three candidates out of 67 with an
insufficient speaking proficiency would have passed the test.
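
A sketch of how misclassifications of this kind could be counted,
taking the speaking proficiency decision as the criterion. All scores
and cut-off values below are invented, the function name is
hypothetical, and the rule that every written subtest must be passed
is an assumption about how the subtests would be combined.

```python
def count_misclassifications(candidates, oral_cutoff, subtest_cutoffs):
    """Count candidates who pass every written subtest but fail the oral criterion."""
    misclassified = 0
    for cand in candidates:
        passes_subtests = all(
            cand[name] >= cutoff for name, cutoff in subtest_cutoffs.items()
        )
        passes_oral = cand["speaking"] >= oral_cutoff
        if passes_subtests and not passes_oral:
            misclassified += 1
    return misclassified


# Hypothetical candidates and cut-off scores.
candidates = [
    {"speaking": 160, "vocabulary": 30, "cloze": 60},
    {"speaking": 145, "vocabulary": 28, "cloze": 55},
    {"speaking": 120, "vocabulary": 12, "cloze": 58},
    {"speaking": 100, "vocabulary": 10, "cloze": 20},
]
cutoffs = {"vocabulary": 25, "cloze": 50}
print(count_misclassifications(candidates, 150, cutoffs))  # prints 1
```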

With regard to the possibility of misclassifications the
results of the subtest orthography were more problematic. Of the
candidates that passed the speaking proficiency test 16 passed the
orthography test as well, but among the candidates just below the
cut-off-score there still were four of them that passed the
orthography test, and with lower scores there even were six.

It is obvious that orthography is linked up with language
knowledge, but to a large extent it can be acquired as an isolated
system, especially if it is confined to general rules. Those who
passed the orthography subtest with a deficient speaking proficiency
might have acquired their knowledge by their unflagging energy in
language classes.

This illustrates the necessity to take the factor language
background into consideration when testing language proficiency
(Cziko 1984). Language background refers to the type of contact
the testees have had with the second language and the opportunity
they have had for acquiring the various aspects of the language.
If this language background consists of a language course where
writing skills are emphasized, language proficiency might be
flattered if these skills are much represented in the evaluation
procedure. If, on the other hand, the various aspects of a
language are trained in a more balanced way, distinct subtests
will represent language proficiency as a whole more adequately.